December 5, 2016

Identifying Statistical Patterns in Educational Inequality

Available Data

Source: College Scorecard, a federally maintained dataset

Outcome Statistics by Cohort:

  • Completion Rates disaggregated by Race (percent)
  • Post-Graduation Earnings disaggregated by Household Income Terciles (in dollars)

Explanatory Statistics for each school:

  • Type (Public, Non-Profit Private, For-Profit Private) (coded 1, 2, 3)
  • Instructional Expenditures per Full-time Student (in dollars)

Questions Motivating our Approach

Are "outcome gaps" between groups related to the type of school or expenditures?

  • Compute gap metrics as percent difference in outcomes

  • Conduct hypothesis testing
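A minimal sketch of the gap metric, assuming "percent difference" means the difference between two groups' outcomes divided by the first-named group's outcome. The deck does not state which group is the denominator, so that choice, the function name percent_gap, and the example record are all illustrative assumptions.

```python
def percent_gap(reference, comparison):
    """Percent difference between two group outcomes, relative to the reference group.

    Assumption: the first-named group (e.g. White in "White-Black") is the reference.
    A negative value means the comparison group outperforms the reference group.
    """
    if reference == 0:
        raise ValueError("reference outcome must be non-zero")
    return 100.0 * (reference - comparison) / reference


# Hypothetical school record; these keys are illustrative, not College Scorecard columns.
school = {"completion_white": 0.60, "completion_black": 0.45}
wb_gap = percent_gap(school["completion_white"], school["completion_black"])

# Hypothetical earnings example where the second group earns more than the first.
mid_low_gap = percent_gap(40000, 42000)
```

With this sign convention, a negative gap means the second-named group does better than the first.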

Are other variables related to outcome gaps?

  • Run Random Forest with additional variables for insight on relative importance

Exploratory Data Analysis

Completion Rates by Race

Trend looks generally negative, but with a lot of variance

Completion Rates by Race

CSUs have significant completion rate gaps, but UCs and private elites have similar outcome gaps despite differences in expenditures.

Earnings Gaps by Household Income

Interestingly, the gap between middle income and high income students seems to worsen with higher expenditures.

Earnings Gaps by Household Income

Negative values indicate that low-income students actually tend to outperform middle-income students.

Completion Rates: White-Black

The data looks fairly random. Possibly a negative trend, but fairly weak-looking.

Completion Rates: White-Hispanic

Similar patterns. It does seem that at higher expenditures, there is less variance in completion gaps.

Completion Rates: White-Asian

There definitely seem to be differences in means between the three groups, but the distributions are similar.

Post Graduation Earnings: High-Low

This correlation for high-low income earnings gaps looks much stronger.

Post Graduation Earnings: High-Mid

Correlation with high-middle income gaps is less dramatic but still seems to have some negative trend.

Post Graduation Earnings: Mid-Low

This pattern seems fairly flat, maybe a slight positive trend.

Formal Analysis: ANOVA

White-Black Completion ANOVA

Little evidence that the White-Black completion rate gap differs significantly between types of schools.

White-Hispanic Completion ANOVA

Here we see evidence that for-profit schools have significantly different White-Hispanic completion gaps.

White-Asian Completion ANOVA

We see a similar pattern with the White-Asian completion gap.

High-Low Earnings ANOVA

Differences look more significant here.

High-Mid Earnings ANOVA

Less significant in comparison to High-Low.

Mid-Low Earnings ANOVA

Differences are less significant, and the trend seen in the previous two plots reverses.
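For reference, the one-way ANOVA comparisons above test whether the mean gap differs across the three school types. A minimal sketch of the F statistic follows, using only the standard library. The function name one_way_anova_f and the gap values are made up for illustration and are not taken from the College Scorecard data.

```python
from statistics import fmean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples, one sample per group."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = fmean(x for g in groups for x in g)

    # Between-group variation: how far each group mean sits from the grand mean.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Within-group variation: spread of observations around their own group mean.
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)

    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Made-up completion-gap values (percent) for each school type.
public = [10.0, 20.0, 30.0]
nonprofit_private = [20.0, 30.0, 40.0]
for_profit_private = [50.0, 60.0, 70.0]

f_stat, df_b, df_w = one_way_anova_f([public, nonprofit_private, for_profit_private])
```

Turning F into a p-value requires the F distribution, which the standard library does not provide. If SciPy is available, scipy.stats.f_oneway computes both the statistic and the p-value.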

Formal Analysis: Correlations

Linear Models

Hypothesis: coefficients regressing outcome gaps on expenditures will be negative
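A sketch of how such a hypothesis can be checked: fit a simple linear regression of the gap on expenditures, then compare the slope to its standard error. This is an illustration with made-up numbers, not the analysis code behind the slides. The function name ols_slope and the data are assumptions.

```python
from math import sqrt
from statistics import fmean


def ols_slope(x, y):
    """Fit y = intercept + slope * x; return (slope, standard_error, t_statistic)."""
    n = len(x)
    x_bar, y_bar = fmean(x), fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    # Residual variance with n - 2 degrees of freedom (two fitted parameters).
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    std_err = sqrt(rss / (n - 2) / sxx)
    return slope, std_err, slope / std_err


# Made-up data: expenditures per student (thousands of dollars) and outcome gap (percent).
expenditures = [1.0, 2.0, 3.0, 4.0, 5.0]
gaps = [10.0, 8.0, 7.0, 5.0, 5.0]

slope, std_err, t_stat = ols_slope(expenditures, gaps)
```

A negative slope with a large absolute t statistic (on n - 2 degrees of freedom) would support the hypothesis. A slope near zero relative to its standard error would not.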

Completion Rate Gaps

Most slopes are not significantly different from zero, and many are positive. Hypothesis rejected.

Earnings Gaps

Extremely significant negative coefficient for high-low, others less significant.

Random Forest Exploration of other Variables

Overview

Consider relationships and relative importance of other possibly relevant variables, relating to:

  • Location
  • Student Body Racial Diversity
  • Student Body Economic Distribution
  • Selectivity
  • Type of Degree Awarded

Performance

The models generally performed poorly in terms of proportion of variance explained.

Completion Rates

INEXPFTE (instructional expenditures per full-time student) is quite important compared to other terms. Diversity also seems relevant.

Earnings Gaps

INEXPFTE is important again. Economic variables are more important in these models.

Conclusions

Addressing Motivating Questions

  • Completion Rate differences across Races did not seem to correspond strongly with type of school or expenditures

  • Significant evidence of Earnings Gaps across economic backgrounds corresponding with expenditures

  • Caveat: results are not adjusted for multiple testing

  • Of the many variables considered, Random Forest analysis suggests INEXPFTE (instructional expenditures) is more important than the others, while CONTROL (type of school) is unimportant
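On the multiple-testing point above: one common way to adjust a family of p-values is the Holm step-down procedure, sketched below. Holm is an assumed choice for illustration, since the deck names no correction method. The function name holm_adjust and the p-values are hypothetical, not results from this analysis.

```python
def holm_adjust(p_values):
    """Return Holm-adjusted p-values in the same order as the input."""
    m = len(p_values)
    # Indices of the p-values, ordered from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])

    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[i])
        # Enforce monotonicity so adjusted values never decrease along the ordering.
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted


# Hypothetical raw p-values from three separate tests.
raw_p = [0.01, 0.04, 0.03]
adjusted_p = holm_adjust(raw_p)
```

A test would then be called significant at level 0.05 only if its adjusted p-value is below 0.05.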